Abstract. Dendrograms used in data analysis are ultrametric spaces, hence
objects of nonarchimedean geometry. It is known that there exist p-adic
representations of dendrograms. Completed by a point at infinity, they can be
viewed as subtrees of the Bruhat-Tits tree associated to the p-adic projective
line. The implications are that certain moduli spaces known in algebraic
geometry are p-adic parameter spaces of (families of) dendrograms, and stochastic
classification can also be handled within this framework. At the end,
we calculate the topology of the hidden part of a dendrogram.



1. Introduction 

Dendrograms used in data analysis are ultrametric spaces. Hence they are objects
of nonarchimedean geometry, a special instance of which is p-adic geometry.
Murtagh [19] shows how to associate to a dendrogram a set of p-adic representations
of integers. This lies well within the tradition of using ultrametrics in order
to describe the hierarchical ordering in classification (cf. and the references 
therein). 

However, there seems to be a problem in choosing the prime number p for the
p-adic representation of dendrograms: the geometry of the p-adic number field
$\mathbb{Q}_p$ allows at most p maximal subclusters of any given cluster.
We will show that this can be overcome by considering finite field extensions of
$\mathbb{Q}_p$, so that the convenient choice p = 2 becomes feasible for any dendrogram.
This seems to be compliant with the philosophy of allowing any nonarchimedean
complete valued field for describing, coding or computing in data analysis. We
acknowledge here our inspiration by [18].

Our point of view is in fact of a geometric nature. For a p-adic geometer, a
dendrogram is nothing but the affine p-adic line with n punctures, of which a
certain kind of covering can be made whose intersection graph is the tree in
bijection with the dendrogram from the point of view of data analysis. Completing
the affine line to the projective line and then taking an extra puncture $\infty$ allows
us to see the dendrogram as a subtree of the Bruhat-Tits tree, which is an important
object in the study of p-adic algebraic curves. A first application is the coding of
DNA sequences [9], which is a special case of p-adic methods for processing strings
over a given alphabet, as explained in [7], where new invariants of time series
of dendrograms are also developed.

It is imperative from the geometric viewpoint to study families of
dendrograms. For these, parameter spaces already exist. In fact, it is the
moduli space $M_{0,n}$ of genus 0 curves with n punctures from algebraic geometry
which now becomes the central object of interest. Each point of the p-adic version
of $M_{0,n}$ is a dendrogram with the extra point $\infty$. It is then a natural consequence
that a stochastic dendrogram is a continuous family of dendrograms together with
a probability distribution on it, or, to make this more precise, a map from
a p-adic set of parameters S to $M_{0,n}$ with a probability distribution on S. We
will give an idea of p-adic spaces by explaining the Berkovich topology on
them. Due to the ultrametric property, p-adic spaces in a naive sense are totally
disconnected. This problem can be remedied by introducing extra points which
can, in a generalised sense, be viewed as clusters of usual points.

In this framework, collisions of points in their evolution through time can be
formally described by considering the compactification $\overline{M}_{0,n}$ by stable trees of
projective lines, which we call stable dendrograms. Time series of dendrograms, on
the other hand, yield (analytic) maps $M_{0,m} \to M_{0,n}$ between the moduli spaces.
Further applications of these moduli spaces should be in the study of consensus of
dendrograms.

We end by calculating the topology of the hidden part of a dendrogram, i.e. 
the subgraph spanned by vertices corresponding to clusters which do not have 
singletons as maximal subclusters. This subgraph determines the distribution of 
the other clusters, which are "near the end" of the dendrogram. 

An introduction to p-adic numbers is [12]. Algebraic curves can be learned
with a minimum of technical prerequisites in [13]. A bird's-eye view of moduli
spaces of curves is found in [17, Appendix: Curves and Their Jacobians]. A broader
introduction to moduli of curves is [13]. A non-technical introduction to Berkovich
spaces and analysis on the projective line is contained in [1, 2]. Those who intend
an intensive study of these subjects might wish to learn more algebraic geometry,
which can be found in [17].



2. Dendrograms and nonarchimedean geometry 



Figure 1. A 2-adic dendrogram.
Dendrograms are known to be endowed with a nonarchimedean metric, also
called an ultrametric, for which the strong triangle inequality

$d(x, y) \le \max\{d(x, z), d(z, y)\}$

holds. Therefore, it is quite tempting to use p-adic numbers for their description,
and in fact, this has recently been done [18, 19]. I shall explain this using the
example dendrogram of Figure 1, which is a slight modification of [El Fig. 1].
Choose a prime number p, and distribute the p numbers $0, \dots, p-1$ across the
pieces into which each horizontal line segment is divided by its intersection points
with the vertical line segments of the dendrogram. For the top horizontal line segment,
one has to introduce one extra vertical line segment going upward, as done in
Figure 1. Going down along a path $\gamma$ from the top vertical line segment all the
way down to one of the points $x_i$, one picks up the numbers $a_\ell$ on the traversed
horizontal line segments $\ell$ and obtains

$x_i = \sum_{\ell} a_\ell \, p^{v(\ell)},$

where $v = v(\ell)$ runs through all levels of the horizontal parts $\ell$ of the path $\gamma$.
In our example from Figure 1 we assume p = 2, and obtain the numbers

xi =0, xi^ 2^, xz = 2^, x^ ^ 22, 

2:5 = 22 + 24, X6 = 2^ + 2\ XT = 2°, X8^2° + 2\ 

Note that these dyadic representations differ from the ones in [TH §2]. In any
case, each path from the top to a bottom end of the dendrogram corresponds
to a p-adic power series representation of an integer. The choice of the
prime p is arbitrary. However, it might seem that the number of vertical
segments attached to one horizontal line segment in a p-adic representation
of a dendrogram is bounded by p. But this is not the case. In fact, one
can restrict to the arbitrary choice p = 2, if one wishes, and describe all
dendrograms with the help of a little algebra, as will be seen in the following section.
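The encoding above, and the fact that it produces an ultrametric, can be checked numerically. The following Python sketch is only an illustration: the input format (a list of (level, digit) pairs per path) and the names `path_to_integer`, `valuation` and `p_adic_distance` are our own assumptions, and only the legible values $x_1 = 0$, $x_4 = 2^2$, $x_5 = 2^2 + 2^4$ and $x_7 = 2^0$ of the example are used.

```python
from fractions import Fraction
from itertools import permutations


def path_to_integer(path, p):
    """Sum a * p**v over the (level v, digit a) pairs picked up along a path."""
    total = 0
    for level, digit in path:
        if not 0 <= digit < p:
            raise ValueError("each digit must lie in {0, ..., p-1}")
        total += digit * p ** level
    return total


def valuation(x, p):
    """p-adic valuation of a nonzero integer: the exponent of p in x."""
    if x == 0:
        raise ValueError("the valuation of 0 is +infinity")
    v = 0
    while x % p == 0:
        x //= p
        v += 1
    return v


def p_adic_distance(x, y, p):
    """Ultrametric distance |x - y|_p = p**(-v_p(x - y)), and 0 if x == y."""
    if x == y:
        return Fraction(0)
    return Fraction(1, p ** valuation(x - y, p))


# Legible paths of the 2-adic example; digits equal to 0 contribute nothing.
points = {
    "x1": path_to_integer([], 2),                # 0
    "x4": path_to_integer([(2, 1)], 2),          # 2^2
    "x5": path_to_integer([(2, 1), (4, 1)], 2),  # 2^2 + 2^4
    "x7": path_to_integer([(0, 1)], 2),          # 2^0
}

# The strong triangle inequality holds on every ordered triple of points.
for x, y, z in permutations(points.values(), 3):
    assert p_adic_distance(x, y, 2) <= max(p_adic_distance(x, z, 2),
                                           p_adic_distance(z, y, 2))
```

Points whose paths agree on more levels (here $x_4$ and $x_5$) are 2-adically closer, which is how the cluster structure of the dendrogram is recovered from the numbers.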

3. The Bruhat-Tits tree 

Let $\mathbb{Q}_p$ be the field of p-adic numbers. It is a complete nonarchimedean normed
field whose norm will be denoted by $|\cdot|_p$. Consider the unit disk

$D = \{x \in \mathbb{Q}_p \mid |x|_p \le 1\} = B_1(0).$

It contains the p maximal smaller disks

$B_{1/p}(0),\ B_{1/p}(1),\ \dots,\ B_{1/p}(p-1)$

corresponding to the residue field $\mathbb{F}_p$ of $\mathbb{Q}_p$. This well-known fact is actually a
consequence of the construction from the previous section.

It is useful to consider the p-adic projective line $\mathbb{P}(\mathbb{Q}_p) = \mathbb{Q}_p \cup \{\infty\}$, in which
there is the maximal disk outside D:

$\{x \in \mathbb{P}(\mathbb{Q}_p) \mid |x|_p \ge p\} = B_p(\infty).$

Due to the ultrametric topology on the p-adic projective line, the "closure" of an
"open" disk depends somewhat on the choice of a point on its "boundary" [10,
§1.1]. Therefore, we make the definition below. The usefulness of this extra detail
will become apparent in the following sections.
Figure 2. The Bruhat-Tits tree for $\mathbb{Q}_2$.

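Definition 3.1. Let

$B = \{x \in \mathbb{P}(\mathbb{Q}_p) \mid |x - a|_p < r\}$ (resp. $B = \{x \in \mathbb{P}(\mathbb{Q}_p) \mid |x - a|_p > r\}$)

for some $a \in \mathbb{Q}_p$ and a p-adic value $r = |e|_p$, $e \in \mathbb{Q}_p \setminus \{0\}$, and let $b \in \mathbb{Q}_p$ be such that
$|a - b|_p = r$. The affinoid closure of B with respect to $\infty$ (resp. to b) is the disk

$\bar{B} = \{x \in \mathbb{P}(\mathbb{Q}_p) \mid |x - a|_p \le r\}$ (resp. $\bar{B} = \{x \in \mathbb{P}(\mathbb{Q}_p) \mid |x - b|_p \ge r\}$).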

Using the projective line necessitates the introduction of an equivalence relation
on the set of all disks of $\mathbb{P}(\mathbb{Q}_p)$. Namely, disks $B_1, B_2$ are said to be equivalent,
$B_1 \sim B_2$, if either $B_1 = B_2$ or the affinoid closure of $\mathbb{P}(\mathbb{Q}_p) \setminus B_2$ with respect to
some point $a \in B_2$ equals $B_1$ [15, §1]. One checks that the relation $\sim$ is indeed an
equivalence relation.

The Bruhat-Tits tree $\mathcal{T}_p$ is defined by setting its vertices to be the equivalence
classes of disks in $\mathbb{P}(\mathbb{Q}_p)$, and its edges are given by maximal inclusion of disks,
i.e. an edge $e = ([B_1], [B_2])$ means that $B_1$ is strictly contained in $B_2$, and $B_1$ is
a maximal disk with this property, for suitable representative disks. It is a well-known
fact that $\mathcal{T}_p$ is indeed a tree. This can be seen directly in this way: each
class is obviously represented by a unique disk B which is the closure with respect
to $\infty \notin B$, and the disks not containing infinity are preordered by inclusion; so $\mathcal{T}_p$
is a directed acyclic graph, hence a tree by the ultrametric property of $|\cdot|_p$.

The star of a vertex v in $\mathcal{T}_p$, denoted by $\mathrm{Star}_{\mathcal{T}_p}(v)$, consists of all edges
emanating from v. The edges of any star are in one-to-one correspondence with the
points of $\mathbb{P}(\mathbb{F}_p) = \mathbb{F}_p \cup \{\infty\}$, i.e. the $\mathbb{F}_p$-rational points of the projective line over
the residue field $\mathbb{F}_p$. Namely, this is true for the vertex $v_D$ corresponding to the unit
disk D, and the group of Möbius transformations acts transitively on the vertices of
$\mathcal{T}_p$ [15, Bemerkung 5]. Thus the Bruhat-Tits tree is a (p + 1)-regular locally finite
tree. An illustration of $\mathcal{T}_2$ from p Fig. 5] is given in Figure 2.
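The (p + 1)-regularity can be made concrete for the vertices represented by disks $a + p^k\mathbb{Z}_p$ with $k \ge 0$ and integer centre $0 \le a < p^k$. In the following Python sketch (the pair encoding (k, a) and the function names are our own, and only this family of vertices is covered), such a vertex has the p maximal subdisks $a + jp^k + p^{k+1}\mathbb{Z}_p$, one for each residue $j \in \mathbb{F}_p$, plus one edge towards the next larger disk, which plays the role of the point $\infty \in \mathbb{P}(\mathbb{F}_p)$.

```python
def children(k, a, p):
    """The p maximal proper subdisks of a + p^k Z_p, one per residue j in F_p."""
    return [(k + 1, a + j * p ** k) for j in range(p)]


def parent(k, a, p):
    """The smallest disk strictly containing a + p^k Z_p (radius p^-(k-1))."""
    if k == 0:
        # Z_p itself: its parent is p^-1 Z_p, the disk of radius p around 0.
        return (-1, 0)
    return (k - 1, a % p ** (k - 1))


def star(k, a, p):
    """All neighbours of the vertex (k, a) in the Bruhat-Tits tree."""
    if k < 0 or not 0 <= a < p ** k:
        raise ValueError("expected k >= 0 and 0 <= a < p**k")
    return children(k, a, p) + [parent(k, a, p)]


def contains(k, a, x, p):
    """Does the disk a + p^k Z_p contain the integer x?"""
    return (x - a) % p ** k == 0
```

For p = 2 the unit disk has the star `[(1, 0), (1, 1), (-1, 0)]`: the two residue classes 0 and 1 together with the direction of $\infty$.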

By construction, the tree $\mathcal{T}_p$ is invariant under transformations of the form
$z \mapsto \frac{az + b}{cz + d}$ with $a, b, c, d \in \mathbb{Q}_p$ such that $ad - bc \neq 0$. These transformations are
called projective linear or Möbius transformations, and form the group $\mathrm{PGL}_2(\mathbb{Q}_p)$.
The reason for invariance under $\mathrm{PGL}_2(\mathbb{Q}_p)$ is the well-known fact that Möbius
transformations take equivalent disks to equivalent disks.

As a cluster may have more than p maximal subclusters, it
would be convenient to be able to represent such dendrograms without enlarging
the prime p. So, let $K \supset \mathbb{Q}_p$ be a finite extension field of $\mathbb{Q}_p$. The p-adic norm
extends, similarly as in the archimedean case, uniquely to an ultrametric norm $|\cdot|_K$
on K, and K is complete with respect to $|\cdot|_K$. Such a field K is called a p-adic
field.

For a p-adic field K there is, in a similar manner as for $\mathbb{Q}_p$, a Bruhat-Tits tree $\mathcal{T}_K$.
Again K has a finite residue field with $q = p^m$ elements, and $\mathcal{T}_K$ is (q + 1)-regular.
Therefore, in practical applications it should be possible to stick to the prime p = 2
and make finite field extensions if there are clusters with more than 2 child
clusters. Again, $\mathrm{PGL}_2(K)$ respects the symmetries of the hierarchical structure of
the Bruhat-Tits tree, i.e. $\mathcal{T}_K$ is invariant under projective linear transformations
defined over K.

For convenience, we now assume that $K = \mathbb{Q}_p$. However, everything said in the
following is also valid for arbitrary p-adic fields.
It is well known that any infinite descending chain

(1) $B_1 \supset B_2 \supset \cdots$

of strictly smaller disks in $\mathbb{P}(\mathbb{Q}_p)$ converges to a unique point

$\bigcap_i B_i$

on the p-adic projective line $\mathbb{P}(\mathbb{Q}_p)$. A chain (1) defines a halfline in the Bruhat-Tits
tree $\mathcal{T}_p$.

An end in a tree is an equivalence class of halflines, where two halflines are said
to be equivalent if they differ only by finitely many edges. It is a fact, and not too
difficult to check, that the ends of the tree $\mathcal{T}_p$ correspond bijectively to the points
of $\mathbb{P}(\mathbb{Q}_p)$.

The following subtree of the Bruhat-Tits tree is an idea of F. Kato [Ti5, §5.4],
which turned out to be useful in the study of discontinuous group actions:

Definition 3.2. Let $X \subset \mathbb{P}(\mathbb{Q}_p)$ be a finite set containing 0, 1 and $\infty$. Then the
smallest subtree $\mathcal{T}^*(X)$ of $\mathcal{T}_p$ having X as its set of ends is called the projective
dendrogram for X.

Note that the definition of $\mathcal{T}^*(X)$ makes sense even if X does not contain 0, 1
or $\infty$.

Example 3.3. (1) Let $x_0, x_1 \in \mathbb{P}(\mathbb{Q}_p)$ be two distinct points, and set $X = \{x_0, x_1\}$.
It defines the subtree $\mathcal{T}^*(X)$, which is a straight line: the geodesic in $\mathcal{T}_p$ between
$x_0$ and $x_1$, as illustrated in Figure 3.

Figure 3. Geodesic line in $\mathcal{T}_p$.
(2) Let $X = \{x_0, x_1, x_2\}$ be a set of three mutually distinct points in $\mathbb{P}(\mathbb{Q}_p)$. Then
the subtree $\mathcal{T}^*(X)$ is a tripod, as depicted in Figure 4. We denote by $v(x_0, x_1, x_2)$
the unique vertex of $\mathcal{T}^*(X)$ whose star has three edges.



Figure 4. Tripod in $\mathcal{T}_p$.
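For points of $\mathbb{Z} \cup \{\infty\}$ the tripod vertex can be computed directly. The vertex $v(x, y, \infty)$ is the smallest disk containing x and y. For three finite points the ultrametric property makes two of the three pairwise distances equal and the third no larger, so $v(x_0, x_1, x_2)$ is the smallest of the three pairwise such disks. The Python sketch below assumes integer inputs, encodes a disk $a + p^k\mathbb{Z}_p$ as the pair (k, a mod $p^k$) and uses None for $\infty$. These conventions and the function names are illustrative only.

```python
from itertools import combinations

INF = None  # stands for the point at infinity of P(Q_p)


def valuation(x, p):
    """p-adic valuation of a nonzero integer."""
    v = 0
    while x % p == 0:
        x //= p
        v += 1
    return v


def pair_disk(x, y, p):
    """Smallest disk containing the distinct integers x and y, as (k, x mod p^k)."""
    k = valuation(x - y, p)
    return (k, x % p ** k)


def tripod_vertex(x0, x1, x2, p):
    """The vertex v(x0, x1, x2) of the tripod T*({x0, x1, x2})."""
    points = [x0, x1, x2]
    if len(set(points)) != 3:
        raise ValueError("the three points must be distinct")
    finite = [x for x in points if x is not INF]
    # Largest k, i.e. smallest disk, among the pairs of finite points; ties
    # in k can only occur between identical disks, by the ultrametric property.
    return max(pair_disk(x, y, p) for x, y in combinations(finite, 2))
```

For example, `tripod_vertex(0, 1, INF, 2)` gives `(0, 0)`, the unit disk. This agrees with the statement in Convention 3.4 that 0, 1 and $\infty$ define the vertex $v_D$.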

For a subset X of $\mathbb{P}(\mathbb{Q}_p)$, define $\mathcal{T}(X)$ to be the smallest subtree of $\mathcal{T}_p$
containing all vertices of the form
$v(x_0, x_1, x_2)$ with $x_0, x_1, x_2 \in X$. Notice that this subtree is non-empty if and only
if X contains at least three points. We call $\mathcal{T}(X)$ the finite part of the projective
dendrogram $\mathcal{T}^*(X)$. We have the obvious inclusion

$\mathcal{T}(X) \subset \mathcal{T}^*(X)$

of trees.

It is useful not to take into account all vertices of the finite part $\mathcal{T} = \mathcal{T}(X)$ of
a projective dendrogram. Consider all paths $\gamma = [v, w]$ (without backtracking) of
maximal length in $\mathcal{T}$ whose vertices in $(v, w)$ have no edges outside $\gamma$ emanating
from them. By replacing every such path $\gamma$ of $\mathcal{T}$ by a single edge, but of length equal
to that of $\gamma$, we obtain a so-called stable tree $\mathcal{T}^{\mathrm{stab}}$ whose vertices have the property
that at least three edges emanate from each of them. The tree $\mathcal{T}^{\mathrm{stab}}$ is called the
stabilisation of $\mathcal{T}$.
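Stabilisation is a purely combinatorial operation: every vertex of order 2 is suppressed, and its two edges are merged into one whose length is their sum. Here is a Python sketch of it. By our own convention, not the text's, the tree is a dictionary mapping each vertex to a dictionary {neighbour: edge length}. The ends of $\mathcal{T}^*(X)$ are modelled as leaves, so they are never suppressed.

```python
def stabilise(tree):
    """Suppress all vertices of order 2, merging their two edges into one.

    `tree` maps each vertex to {neighbour: edge_length}; it is not modified.
    """
    t = {v: dict(nbrs) for v, nbrs in tree.items()}
    changed = True
    while changed:
        changed = False
        for v in list(t):
            if len(t[v]) == 2:
                (u, a), (w, b) = t[v].items()
                # Remove v and join its two neighbours by an edge of length a + b;
                # in a tree, u and w are distinct and not already adjacent.
                del t[u][v], t[w][v], t[v]
                t[u][w] = t[w][u] = a + b
                changed = True
    return t


# Example: vertex "m" lies on a path c - m - d and has no other edges.
tree = {
    "c": {"e1": 1, "e2": 1, "m": 1},
    "m": {"c": 1, "d": 2},
    "d": {"m": 2, "e3": 1, "e4": 1},
    "e1": {"c": 1}, "e2": {"c": 1}, "e3": {"d": 1}, "e4": {"d": 1},
}
stable = stabilise(tree)
```

In the example, the path from c through m to d becomes a single edge of length 3. Every remaining non-leaf vertex then has order at least three.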

Convention 3.4. By a (projective) dendrogram $\mathcal{T}^* = \mathcal{T}^*(X)$ we will usually mean
the tree obtained by identifying the finite part $\mathcal{T}(X)$ with its stabilisation $\mathcal{T}(X)^{\mathrm{stab}}$.

A vertex v of $\mathcal{T}(X)$ is considered to be a cluster of the points corresponding to
the halflines in $\mathcal{T}^*(X)$ emanating from v. Fixing the points 0, 1 and $\infty$ is done for
reasons of normalisation: two points define a geodesic, three points define a unique
vertex in $\mathcal{T}_p$, and the three points 0, 1 and $\infty$ define the vertex $v_D$ corresponding
to the unit disk D.

In this way, the usual dendrogram obtained from $\mathcal{T}^*(X)$ is

$\mathcal{T}^*(X) \setminus$ the halfline $(v_D, \infty)$.
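Suppose $X \subset \mathbb{Z} \cup \{\infty\}$ with $\infty \in X$. Then the vertices of the finite part are the tripod vertices $v(x, y, \infty)$, i.e. the smallest disks containing two distinct finite points of X. A tripod vertex of three finite points is always one of these. The cluster of the usual dendrogram below such a vertex consists of the finite points of X that lie in the disk. The Python sketch below lists these clusters. It is an illustration under these assumptions only; the disk encoding (k, a mod $p^k$) and all names are ours.

```python
from itertools import combinations


def valuation(x, p):
    """p-adic valuation of a nonzero integer."""
    v = 0
    while x % p == 0:
        x //= p
        v += 1
    return v


def clusters(points, p):
    """Map each vertex (k, a mod p^k) of the finite part to its cluster.

    `points` are the distinct finite (integer) points of X; infinity is implicit.
    """
    result = {}
    for x, y in combinations(points, 2):
        k = valuation(x - y, p)
        disk = (k, x % p ** k)
        result[disk] = frozenset(z for z in points if (z - x) % p ** k == 0)
    return result


def print_hierarchy(points, p):
    """Print the clusters from the largest disk (smallest k) downwards."""
    for (k, a), members in sorted(clusters(points, p).items()):
        print(f"|x - {a}|_{p} <= {p}^-{k}:", sorted(members))
```

For the 2-adic points 0, 1, 4 and 20, this yields three nested clusters: {0, 1, 4, 20} at the unit disk, {0, 4, 20} at the disk $4\mathbb{Z}_2$, and {4, 20} at the disk $4 + 16\mathbb{Z}_2$.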

A "genuine" dendrogram has the property that $X \subset \mathbb{Z} \cup \{\infty\}$, or, more generally,
that every $x \in X \setminus \{\infty\}$ has a finite expansion

$x = a_0 + a_1\pi + \cdots + a_m\pi^m, \quad a_i \in \{0, \dots, q - 1\},$

where $\pi$ is a prime element of $\mathcal{O}_K = \{z \in K \mid |z|_K \le 1\}$, and q is the order of the
residue field of K (cf. [12, §5] for more details on finite field extensions of $\mathbb{Q}_p$).
Remark 3.5. As noted in [6], the task of hierarchical classification conceptually
becomes that of finding a suitable p-adic encoding which reveals the inherent
hierarchical structure of data. The reason is that the p-adic dendrogram $\mathcal{T}^*(X)$ of a given
set $X \subset \mathbb{P}^1(\mathbb{Q}_p)$ is uniquely determined by X. Algorithmically, the computation of
$\mathcal{T}^*(X)$ is much simpler than its classical counterpart [7, §3.2].

4. The space of dendrograms 

Let $M_{0,n}$ denote the space of all projective dendrograms for sets of cardinality $n \ge 3$.
This space is also known as the moduli space of genus 0 curves with n
punctures. The term "genus 0 curve" means a nonsingular projective algebraic curve
of genus 0, i.e. the projective line. By fixing n points $x_1, \dots, x_n$ on the projective line
$\mathbb{P}(\mathbb{Q}_p)$ and then moving these points by a Möbius transformation such that the
first three become 0, 1, $\infty$, we obtain a projective dendrogram.

As moduli spaces parametrise objects up to isomorphism, and isomorphisms of
punctured curves send punctures to punctures, we indeed have a moduli space $M_{0,n}$
of dendrograms by considering in each isomorphism class a normalised representative.

It is a well-established fact that

$M_{0,n} = (\mathbb{P}^1 \setminus \{0, 1, \infty\})^{n-3} \setminus \Delta,$

where $\Delta$ is the fat diagonal given by $x_i = x_j$, $i \neq j$, and $\mathbb{P}^1$ is the projective line,
considered as an algebraic variety [17, Appendix: Lecture II].

One may imagine the space $M_{0,n}$ by fixing three points on $\mathbb{P}^1$ and letting the
remaining n − 3 points vary on the projective line without collision.

In the p-adic setting, a family of dendrograms for n points is given by a map
$S \to M_{0,n}$ from some base space S. Each point $s \in S$ corresponds to a dendrogram,
and the dendrogram varies in some sense as s moves along S.

The "geography" of $M_{0,n}$ is as follows: pick a dendrogram x for n points. Moving
the points only slightly does not change the finite part of the dendrogram. Moving
the points a little more results in changes in the lengths of the edges of x, but the
underlying combinatorial structure does not change. The combinatorial tree of x
occupies an open subset U of $M_{0,n}$. Moving points of x even more results in edge
contractions: by contracting one edge, x moves from U to a neighbouring piece
V. $M_{0,n}$ is covered by such disjoint open pieces, each belonging to a combinatorial
tree with n ends. This is due to the fact that $M_{0,n}$, like many spaces in
nonarchimedean geometry, is totally disconnected. This rather uncomfortable fact can be
remedied either by resorting to a so-called Grothendieck topology or by introducing
extra points which then produce a genuine topology (e.g. by considering Berkovich
analytic spaces). This topology will be explained in the following section.



Figure 5. Dendrograms representing $M_{0,4}$.
Figure 5 illustrates the dendrograms represented by the different parts of $M_{0,4}$:
one "central" region v (three children) and three "outer" regions A, B, C (at most
two children). Any path from A to B or C passes through v, as the edge has to be
contracted and then blown up in a different manner.
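After normalising three ends to 0, 1 and $\infty$, a point of $M_{0,4}$ is given by the fourth end $\lambda \ne 0, 1, \infty$. The region of Figure 5 that contains it can be read off from p-adic absolute values:

- if $|\lambda|_p < 1$, the edge separates $\{0, \lambda\}$ from $\{1, \infty\}$;
- if $|\lambda - 1|_p < 1$, it separates $\{1, \lambda\}$ from $\{0, \infty\}$;
- if $|\lambda|_p > 1$, it separates $\{\lambda, \infty\}$ from $\{0, 1\}$;
- otherwise all four ends emanate from $v_D$.

Which outer region is labelled A, B or C cannot be recovered here, so the Python sketch below reports the splitting instead. It is restricted to rational $\lambda$, and the names are ours.

```python
from fractions import Fraction


def valuation(q, p):
    """p-adic valuation of a nonzero rational number."""
    q = Fraction(q)
    if q == 0:
        raise ValueError("the valuation of 0 is +infinity")
    num, den, v = q.numerator, q.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def m04_region(lam, p):
    """Combinatorial type of the dendrogram with ends 0, 1, oo, lam."""
    lam = Fraction(lam)
    if lam in (0, 1):
        raise ValueError("lam collides with 0 or 1: not a point of M_{0,4}")
    if valuation(lam, p) > 0:        # |lam|_p < 1
        return "{0,lam} | {1,oo}"
    if valuation(lam - 1, p) > 0:    # |lam - 1|_p < 1
        return "{1,lam} | {0,oo}"
    if valuation(lam, p) < 0:        # |lam|_p > 1
        return "{lam,oo} | {0,1}"
    return "central"                 # all four ends at the vertex v_D
```

For p = 2 the central region contains no rational $\lambda$. A 2-adic unit $\lambda$ always satisfies $|\lambda - 1|_2 < 1$. This matches the fact that every vertex of $\mathcal{T}_2$ has only three edges.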

5. The Berkovich topology on $M_{0,n}$

We begin with the topology on the unit disk D of a p-adic field K. The classical
points of D are its K-rational points. However, Berkovich defines in [3] more
points, which correspond to multiplicative seminorms on the algebra of power series
convergent on nonarchimedean spaces. For the unit disk this amounts to [3, 1.4.4]:

(1) the classical points,

(2) the disks $\{x \in K \mid |x - a|_K \le r\}$ in D with $r = |e|_K$, $e \in K \setminus \{0\}$,

(3) the disks as in (2), but with $r \neq |e|_K$ for any $e \in K$,

(4) the properly descending chains $B_1 \supset B_2 \supset \cdots$ of disks in D with $\bigcap_i B_i = \emptyset$.

The new points corresponding to (2), (3) or (4) are called generic, or generic
Berkovich points. This also works for the affine line, where one takes the
multiplicative seminorms on the polynomial ring $K[T]$ and obtains similarly the types
(1) to (4) of points. The analogous result holds for the projective line.
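Consider a point of type (2), i.e. a disk $B = \{x \mid |x - a| \le r\}$ with $r = |e|$. It acts on polynomials by the multiplicative seminorm $f \mapsto \max_i |c_i|\, r^i$, where $f = \sum_i c_i (T - a)^i$; this is the Gauss norm of the disk. The Python sketch below evaluates it for polynomials with rational coefficients over $\mathbb{Q}_p$. It is only an illustration of type (2) points, and all helper names are ours.

```python
from fractions import Fraction


def valuation(q, p):
    """p-adic valuation of a nonzero rational number."""
    num, den, v = q.numerator, q.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def abs_p(q, p):
    """p-adic absolute value |q|_p of a rational number."""
    q = Fraction(q)
    return Fraction(0) if q == 0 else Fraction(p) ** (-valuation(q, p))


def taylor_shift(coeffs, a):
    """Coefficients of f in powers of (T - a); coeffs[i] multiplies T**i."""
    c = [Fraction(x) for x in coeffs]
    for i in range(len(c)):
        for j in range(len(c) - 2, i - 1, -1):
            c[j] += a * c[j + 1]
    return c


def disk_seminorm(coeffs, a, r, p):
    """|f|_B = max_i |c_i|_p r^i for the disk B = {|x - a|_p <= r}."""
    shifted = taylor_shift(coeffs, Fraction(a))
    return max(abs_p(c, p) * Fraction(r) ** i for i, c in enumerate(shifted))


def poly_mul(f, g):
    """Product of two coefficient lists."""
    h = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            h[i + j] += Fraction(x) * Fraction(y)
    return h
```

For instance, `disk_seminorm([0, -1, 1], 0, 1, 2)` equals 1 for $T^2 - T$ on the unit disk, although $|x^2 - x|_2 < 1$ for every $x \in \mathbb{Z}_2$. Over $\mathbb{Q}_p$, therefore, a type (2) point is not simply the supremum over classical points, which is one way to see that it is genuinely new.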

The concept of generic Berkovich points via multiplicative seminorms also works
in higher dimensions, and the result is that p-adic manifolds are locally contractible
[4]. In any case, by that concept, the data domain can be viewed as a continuum.

Endowing our space of dendrograms $M_{0,n}$ with the Berkovich topology gives
us a framework for considering continuously varying families of dendrograms.
For example, a stochastic classification of n points (including $\infty$) is nothing but a
probability distribution on $M_{0,n}$, possibly with compact support. Or the problem
of adding a new datapoint to a given classification $x \in M_{0,n}$ means finding a
probability distribution on the fibre $\pi^{-1}(x)$, where $\pi\colon M_{0,n+1} \to M_{0,n}$ is the map
which forgets the (n + 1)-th puncture on the p-adic projective line. A similar thing
applies also to a family $S \to M_{0,n}$, where a distribution has to be found on the
fibre product $S \times_{M_{0,n}} M_{0,n+1}$ with the map $\pi$.

6. Allowing collisions 

So far, our dendrograms for n points can vary continuously in families, but
collisions of points are strictly excluded. In order to allow collisions, one compactifies
the space $M_{0,n}$ to $\overline{M}_{0,n}$. We call the points of $\partial M_{0,n}(\mathbb{Q}_p)$ stable trees of
dendrograms or, by abuse of language, simply stable dendrograms. In fact, these are the
so-called stable n-pointed trees of projective lines. Such are algebraic curves C which
are unions of projective lines L together with n points $X = \{x_1, \dots, x_n\} \subset C$, and which
have the defining properties:

(1) every singular point is an ordinary double point, 

(2) the intersection graph of the projective lines L is a tree, 

(3) every projective line L of which C is composed contains at least three points 
which are either singular points of C or lie in X, 

(4) X consists of regular points of C. 
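Conditions (2) and (3) are combinatorial and can be tested mechanically. In the Python sketch below, a curve is described by our own encoding (not taken from the text):

- the names of its projective lines;
- a list of pairs of lines meeting in a double point;
- a dictionary assigning each marked point to the line containing it.

Conditions (1) and (4) are then built into the encoding, since double points and marked points are recorded separately.

```python
def is_stable(lines, nodes, marks):
    """Check conditions (2) and (3) for a tree of projective lines.

    lines: names of the components L; nodes: pairs (L, L') meeting in a double
    point; marks: dict sending each marked point x_i to the line containing it.
    """
    lines = set(lines)
    if not lines or len(nodes) != len(lines) - 1:
        return False
    adj = {L: set() for L in lines}
    for a, b in nodes:
        if a == b or a not in adj or b not in adj:
            return False
        adj[a].add(b)
        adj[b].add(a)
    # (2) with |E| = |V| - 1, the intersection graph is a tree iff it is connected.
    start = next(iter(lines))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    if seen != lines:
        return False
    # (3) count special points (double points plus marked points) on each line.
    special = {L: len(adj[L]) for L in lines}
    for line in marks.values():
        if line not in special:
            return False
        special[line] += 1
    return all(count >= 3 for count in special.values())
```

Consider the configuration in which $\lambda$ has collided with 0: the lines L and L' meet in one point, L carries 1 and $\infty$, and L' carries 0 and $\lambda$. Each line then has exactly three special points, so the check succeeds.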

In some sense, we can view the points of the boundary $\partial M_{0,n}(\mathbb{Q}_p)$ as dendrograms
of dendrograms. We indeed have in mind such applications as classifications of
classifications.
In order to understand what happens if a dendrogram $x \in M_{0,n}$ moves to the
boundary, consider a dendrogram with four distinct ends 0, 1, $\infty$, $\lambda$, considered as
points on the projective line L. The effect of $\lambda$ moving towards one of the other
three points x is that, upon collision, another projective line L' is formed which
intersects the original line L and on which $\lambda$ and the point x are again distinct.
Such a configuration, corresponding to a point of $\partial M_{0,4}$, is given in Figure 6. In any
case, the resulting tree of dendrograms is indeed stable also for n > 4.




Figure 6. A stable 4-pointed tree of projective lines. 

Note that the tree with ends corresponding to a stable dendrogram does not differ
geometrically from a projective dendrogram in $M_{0,n}$, if one forms a dendrogram
for the punctures on each of the projective lines. The difference is that different
parts of that tree correspond to different projective lines. This is useful for
distinguishing points which are otherwise identified by collisions.

7. Finite families of dendrograms 

Assume a finite family X of datasets $X_1, \dots, X_m$, each consisting of n (classical)
points of the p-adic projective line:

$X_j = \{x_{1j}, \dots, x_{nj}\},$

and assume for the moment that they are all different. For example, X could be a
time series $x_i(t_j) = x_{ij}$ of positions of n non-colliding particles, never at the same
place. Thus X is the union of the $X_j$ and represents an element of $M_{0,mn}$, if we
assume $x_{11} = 0$, $x_{12} = 1$ and $x_{13} = \infty$. By restricting to the points of $X_j$ (e.g. by
taking the points at time $t_j$), we obtain a map

$\pi_j(X)\colon M_{0,mn} \to M_{0,n}$

which is the composition of the two maps

(2) $(0, 1, \infty, x_{14}, \dots, x_{nm}) \mapsto (x_{1j}, \dots, x_{nj}),$

(3) $(x_1, \dots, x_n) \mapsto (0, 1, \infty, x'_4, \dots, x'_n),$

i.e. the canonical projection onto $X_j$ followed by a Möbius transformation $\sigma \in \mathrm{PGL}_2(K)$
(cf. Section 3) which sends the first three points of $X_j$ to 0, 1 and $\infty$.
Note that the Möbius transformation $\sigma = \sigma_X$ is uniquely determined by X and
can be easily computed.
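The Möbius transformation sending three distinct points $x_1, x_2, x_3$ to 0, 1, $\infty$ is $\sigma(z) = \frac{(z - x_1)(x_2 - x_3)}{(z - x_3)(x_2 - x_1)}$, so the maps (2) and (3) are easy to compute. The Python sketch below makes two assumptions of its own: the data are finite rationals, and they are given as a list of time slices, each slice being the list of the n positions at time $t_j$. None stands for $\infty$ in the output, and the function names are ours.

```python
from fractions import Fraction

INF = None  # the point at infinity


def normalising_mobius(x1, x2, x3):
    """Return the Mobius transformation sending x1, x2, x3 to 0, 1, oo."""
    x1, x2, x3 = Fraction(x1), Fraction(x2), Fraction(x3)
    if len({x1, x2, x3}) != 3:
        raise ValueError("the three points must be distinct")

    def sigma(z):
        z = Fraction(z)
        if z == x3:
            return INF
        return (z - x1) * (x2 - x3) / ((z - x3) * (x2 - x1))

    return sigma


def project_and_normalise(slices, j):
    """pi_j(X): keep the points of X_j, then send its first three to 0, 1, oo."""
    points = slices[j]                      # projection (2) onto X_j
    sigma = normalising_mobius(*points[:3])
    return [sigma(z) for z in points]       # normalisation (3)
```

For instance, the slice (2, 3, 5, 7) is normalised to (0, 1, $\infty$, −5).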

If we now allow collisions of datapoints, then we obtain a map

$\bar{\pi}_j(X)\colon \overline{M}_{0,mn} \to \overline{M}_{0,n},$
which we will not make explicit. Instead, we note that if the number of distinct
points of X is k, then we have maps as before

$\pi_j(X)\colon M_{0,k} \to M_{0,n_j},$

where $n_j$ is the number of distinct points in $X_j$. The $\pi_j(X)$ are again canonical
projections followed by Möbius transformations, and are closely related to the maps
$\bar{\pi}_j(X)$.

The advantage of this moduli space approach to finite families lies in the
feasibility of handling situations where one has a continuous family of such X. Moreover,
the Möbius transformation $\sigma_X$ varies continuously with X.

Again, as in Section 5, one can enrich the families by probability distributions
in order to obtain stochastic classifications.



8. Hidden vertices 

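Definition 8.1. Let $\mathcal{T}^* = \mathcal{T}^*(X)$ be a projective dendrogram for X. A vertex v
of $\mathcal{T} = \mathcal{T}(X)$ is called hidden if $\mathrm{Star}_{\mathcal{T}}(v) = \mathrm{Star}_{\mathcal{T}^*}(v)$. The subgraph $\Gamma^h$ of $\mathcal{T}$
spanned by all its hidden vertices is called the hidden subgraph of $\mathcal{T}^*$.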

The quantity $b_0$, defined as the number of connected components of $\Gamma^h$,
measures how the clusters corresponding to non-hidden vertices are spread. As $\Gamma^h$ is a
subgraph of a tree, this number also equals the Euler characteristic $\chi(\Gamma^h)$.

Definition 8.2. Let v be a vertex of a graph Γ. The number $\mathrm{ord}_\Gamma(v) = \#\mathrm{Star}_\Gamma(v)$
is called the order of v in Γ. If $\mathrm{ord}_\Gamma(v) = 1$, then v is called a tip of Γ.

By our convention, any vertex w of a dendrogram has order either 1 or greater
than 2.
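Hidden vertices are easy to detect once the ends are modelled as leaves: a vertex of the finite part is hidden exactly when none of its neighbours is an end. The Python sketch below computes the hidden vertices and $b_0$; the adjacency-set encoding and the names are our own conventions. The example follows the glueing idea of Figure 8. Take two copies of a hidden vertex whose three neighbours each carry two ends. Glue them along one neighbour and remove three ends. The result has n = 9 and $b_0 = 2$, which attains $(n - 3)/3$ in Theorem 8.5.

```python
def hidden_vertices(adj, ends):
    """Vertices of the finite part from which no end emanates."""
    return {v for v in adj if v not in ends and not (adj[v] & ends)}


def count_components(adj, vertices):
    """Number b_0 of connected components of the subgraph spanned by `vertices`."""
    seen, b0 = set(), 0
    for v in vertices:
        if v in seen:
            continue
        b0 += 1
        seen.add(v)
        stack = [v]
        while stack:
            for w in (adj[stack.pop()] & vertices) - seen:
                seen.add(w)
                stack.append(w)
    return b0


def build(edges):
    """Adjacency sets of an undirected graph given by its edge list."""
    adj = {}
    for u, w in edges:
        adj.setdefault(u, set()).add(w)
        adj.setdefault(w, set()).add(u)
    return adj


# Two hidden vertices h1, h2 glued along the shared non-hidden vertex c.
edges = [("h1", "a"), ("h1", "b"), ("h1", "c"), ("h2", "c"), ("h2", "d"),
         ("h2", "e"), ("a", "a1"), ("a", "a2"), ("b", "b1"), ("b", "b2"),
         ("c", "c1"), ("d", "d1"), ("d", "d2"), ("e", "e1"), ("e", "e2")]
adj = build(edges)
ends = {v for v in adj if len(adj[v]) == 1}
hidden = hidden_vertices(adj, ends)
b0 = count_components(adj, hidden)
```

Every non-end vertex in the example has order 3, in line with the convention above.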

Theorem 8.3. Let $\mathcal{T}^* = \mathcal{T}^*(X)$ be a (projective) dendrogram with $\#X = n$.
Then $v^h = \#\mathrm{Vert}(\Gamma^h)$ is bounded from above:

4 

Proof. Case: $\Gamma^h$ connected. If $\Gamma^h$ is connected, then either $b_0 = 1$ or $\Gamma^h = \emptyset$. We
have for the number $t^h$ of tips of $\Gamma^h$:

(4) $4t^h \le n$,

because each tip v in $\Gamma^h$ must have at least two edges in $\mathcal{T} \setminus \Gamma^h$, and, again for
reasons of order, there must be at least two ends in $\mathcal{T}^*$ emanating from each edge
in $\mathrm{Star}_{\mathcal{T}}(v) \setminus \mathrm{Star}_{\Gamma^h}(v)$. This is illustrated in Figure 7, where v is a tip in $\Gamma^h$, and
e the unique edge in $\mathrm{Star}_{\Gamma^h}(v)$.

Now, the order in $\Gamma^h$ of any vertex w is 0, 1 or $\ge 3$. In the first case, $t^h = 0$, and
then



n n 
1 < - < 



6-4' 

where the first inequality follows in a similar way as (4). Assume now that $\Gamma^h$ has
an edge. Then



which is the bound in case $b_0 = 1$.
Figure 7. A hidden tip in a projective dendrogram.

General case. In the general case, we have 

because for each further connected component of $\Gamma^h$ there must be a path from a
tip of one component to a tip of another in $\mathcal{T}(X)$, consisting of vertices from which
ends of $\mathcal{T}^*(X)$ emanate. This proves the theorem, whether $t^h > 0$ or not. □

Corollary 8.4. For X with $n = \#X$, there is a bound for the number of connected
components of $\Gamma^h$:

- g 

Proof. We may assume that $\Gamma^h$ contains no edges. Then $b_0 = v^h$, and

~ 4 

from which the asserted bound follows. □ 

The bound in Corollary 8.4 is not sharp, however. If, for example, $\Gamma^h$ is connected
and not empty, then n must be at least 6. But

, 6 + 4 

Theorem 8.5. For the number of connected components of $\Gamma^h$, there is the following
sharp bound:

$b_0 \le \frac{n - 3}{3},$

where n is the cardinality of X.



Proof. We may assume that $\Gamma^h$ has no edges. By an inductive glueing of trees
as in Figure 8, we obtain that for each additional connected component, one has
to subtract three ends in order to produce a dendrogram having as few ends as
possible. Thus,

$b_0 \le \frac{n + 3(b_0 - 1)}{6} = \frac{n - 3}{6} + \frac{b_0}{2},$

from which the bound follows. Now, if n is a multiple of 3, then $b_0 = \frac{n - 3}{3}$ by
construction. Therefore, in the general case,
Figure 8. Glueing trees along a vertex and removing three ends.

can be constructed. This means that the bound is sharp. □ 

9. Conclusion 

We have given a geometric foundation for an ultrametric approach towards
classification. By extending usual dendrograms by an additional point $\infty$, they can be
considered as points of the moduli space $M_{0,n}$ for the projective line with n
punctures. The Berkovich topology allows one to consider stochastic classification as giving
a continuous family of dendrograms with a probability distribution on it. The points
on the boundary of $\overline{M}_{0,n}$ arise from collisions of continuously evolving datapoints
and are interpreted as dendrograms of dendrograms. Time sections of time series
are given by maps $M_{0,m} \to M_{0,n}$. Finally, the topology of dendrograms is studied,
resulting in bounds for the number of hidden vertices and the Euler characteristic of
the hidden graph, which separates those clusters containing datapoints as maximal
subclusters. The consequence of using p-adic methods is the shift of focus from
imposing a hierarchical structure on data to finding a p-adic encoding which reveals
the inherent hierarchies.